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Abstract 

We  describe  the  beginning  stages  of  our  work 
on  summarizing  chat,  which  is  motivated  by 
our  observations  concerning  the  information 
overload  of  US  Navy  watchstanders.  We  de¬ 
scribe  the  challenges  of  summarizing  chat  and 
focus  on  two  chat-specific  types  of  summa- 
rizations  we  are  interested  in:  thread  sum¬ 
maries  and  temporal  summaries.  We  then  dis¬ 
cuss  our  plans  for  addressing  these  challenges 
and  evaluation  issues. 

1  Introduction 

We  are  investigating  methods  to  summarize  real¬ 
time  chat  room  messages  to  address  a  problem  in 
the  United  States  military:  information  overload 
and  the  need  for  automated  techniques  to  analyze 
chat  messages  (Budlong  et  al.,  2009).  Chat  has  be¬ 
come  a  popular  mode  of  communications  in  the  mil¬ 
itary  (Duffy,  2008;  Eovito,  2006).  On  US  Navy 
ships,  watchstanders  (i.e.,  personnel  who  continu¬ 
ously  monitor  and  respond  to  situation  updates  dur¬ 
ing  a  ship’s  operation,  Stavridis  and  Girrier  (2007)) 
are  responsible  for  numerous  duties  including  mon¬ 
itoring  multiple  chat  rooms.  When  a  watchstander 
reports  to  duty  or  returns  from  an  interruption,  they 
have  to  familiarize  themselves  with  the  current  sit¬ 
uation,  including  what  is  taking  place  in  the  chat 
rooms.  This  is  difficult  with  the  multiple  chat  rooms 
opened  simultaneously  and  new  messages  continu¬ 
ously  arriving.  Similarly,  Boiney  et  al.  (2008)  ob¬ 
served  that  with  US  Air  Force  operators,  when  they 
returned  to  duty  from  an  interruption,  another  oper¬ 
ator  in  the  same  room  verbally  updates  them  with 
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a  summary  of  what  had  recently  taken  place  in  the 
chat  rooms  and  where  they  can  find  the  important  in¬ 
formation.  Both  of  these  situations  are  motivations 
for  chat  summarization,  since  watchstanders  and 
operators  could  use  automatically  generated  sum¬ 
maries  to  quickly  orient  themselves  with  the  current 
situation. 

While  our  motivation  is  from  a  military  perspec¬ 
tive,  chat  summarization  is  also  applicable  to  other 
domains.  For  example,  chat  is  used  for  communica¬ 
tion  in  multinational  companies  (Handel  and  Herb- 
sleb,  2002),  open  source  meetings  (Shihab  et  al., 
2009;  Zhou  and  Hovy,  2005),  and  distance  learning 
(Osman  and  Herring,  2007).  Summarization  could 
aid  people  who  missed  meetings  or  students  who 
wish  to  study  past  material  in  a  summarized  format. 

Even  though  chat  summarization  has  many  poten¬ 
tial  uses,  there  has  been  little  research  on  this  topic 
(Section  3).  One  possible  reason  for  this  is  that  chat 
is  a  difficult  medium  to  analyze:  its  characteristics 
make  it  difficult  to  apply  traditional  NFP  techniques. 
It  has  uncommon  features  such  as  frequent  use  of  ab¬ 
breviations,  acronyms,  deletion  of  subject  pronouns, 
use  of  emoticons,  abbreviation  of  nicknames,  and 
stripping  of  vowels  from  words  to  reduce  number  of 
keystrokes  (Werry,  1996).  Chat  is  also  characterized 
by  conversation  threads  becoming  entangled  due  to 
multiple  conversations  taking  place  simultaneously 
in  multiparticipant  chat ,  i.e.,  chat  composed  of  three 
or  more  users  within  the  same  chat  room  (Herring, 
1999;  Herring,  2010).  The  interwoven  threads  then 
make  it  more  difficult  to  comprehend  individual  con¬ 
versations. 

The  rest  of  this  paper  describes  our  challenges 
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I-(  <SoLo>  Where  did  you  dig  up  that  old  fossil?  ) 

-(  <SkyWlkr>  Ben  is  a  great  man.  j 


<TheKirk>  Scotty  ...  I  need  more  power ! 
<Rplee>  BishOp,  how  much  time? 


<BishOp>  Plenty,  26  minutes! 


-(  <SoLo>  Yeah,  great  at  getting  us  into  trouble.  ) 

<Rplee>  We're  not  leaving! 

<BishOp>  We're  not?  j 

<Scott>  I '  m  giving  her  all  she's  got  captain ! 


Temporal 
,  Summarizer  / 


Temporal 

Summary 


Figure  1:  Process  for  generating  thread  and  temporal  summaries  from  a  chat  log. 


in  chat  summarization.  We  define  two  chat-related 
types  of  summarizations  we  are  investigating  (Sec¬ 
tion  2)  and  describe  related  work  (Section  3).  Fur¬ 
thermore,  we  give  an  overview  of  our  planned  ap¬ 
proach  to  these  challenges  (Section  4)  and  also  ad¬ 
dress  relevant  evaluation  issues  (Section  5). 

2  Our  Summarization  Challenge 

Our  research  goal  is  to  summarize  chat  in  real-time. 
Summaries  need  to  be  updated  with  every  new  chat 
message  that  arrives,  which  can  be  difficult  in  high- 
tempo  situations.  For  these  summarizations,  we 
seek  an  abstract,  compact  format,  allowing  watch- 
standers  to  quickly  situate  themselves  with  the  cur¬ 
rent  situation. 

We  are  investigating  two  types  of  summarization: 
thread  summaries  and  temporal  summaries.  These 
allow  a  user  to  actively  decide  how  much  summa¬ 
rization  they  need.  This  can  be  useful  when  a  user 
needs  a  summary  of  a  long,  important  conversation, 
or  when  they  need  a  summary  of  what  has  taken 
place  since  they  stopped  monitoring  a  chat  room. 

2.1  Thread  Summarization 

The  first  type  of  summarization  we  are  investigating 
is  a  thread  summary.  This  level  of  summarization 
targets  individual  conversation  threads.  An  exam¬ 
ple  of  this  is  shown  in  Figure  1,  where  a  summary 
would  be  generated  of  the  messages  highlighted  in 
green,  which  all  belong  to  the  same  conversation. 
An  example  output  summary  may  then  be: 

SoLo  and  SkyWlkr  are  talking  about 
Ben.  SkyWlkr  thinks  he’s  great,  SoLo 
thinks  he  causes  trouble. 


As  shown,  this  will  allow  for  a  summarization  to 
focus  solely  on  messages  within  a  conversation  be¬ 
tween  users.  A  good  summary  for  thread  summa¬ 
rization  will  answer  three  questions:  who  is  con¬ 
versing,  what  they  are  conversing  about,  and  what  is 
the  result  of  their  conversation.  With  our  example, 
the  summary  answers  all  three  questions:  it  identi¬ 
fies  the  two  speakers  SoLo  and  SkyWlkr,  it  identifies 
that  they  are  talking  about  Ben,  and  that  the  result  is 
SkyWlkr  thinks  Ben  is  great  while  SoLo  thinks  Ben 
causes  trouble. 

The  key  challenge  to  thread  summarization  will 
be  finding,  extracting,  and  summarizing  the  individ¬ 
ual  conversation  threads.  This  requires  the  ability 
to  detect  and  extract  threads,  which  has  become  of 
great  interest  in  recent  research  (Duchon  and  Jack- 
son,  2010;  Eisner  and  Charniak,  2010;  Eisner  and 
Schudy,  2009;  Ramachandran  et  al.,  2010;  Wang 
and  Oard,  2009).  Thread  disentanglement  and  sum¬ 
marization  will  have  to  be  done  online,  with  conver¬ 
sation  threads  being  updated  every  time  a  new  mes¬ 
sage  appears.  Another  challenge  will  be  processing 
incomplete  conversations,  since  some  messages  may 
be  incorrectly  classified  into  the  wrong  conversation 
threads.  These  issues  will  need  to  be  addressed  as 
this  research  progresses. 

2.2  Temporal  Summarization 

The  other  form  of  summarization  we  seek  is  a  tem¬ 
poral  summary.  We  want  to  allow  users  to  dynami¬ 
cally  specify  the  temporal  interval  of  summarization 
needed.  In  addition,  a  user  will  be  able  to  specify 
the  level  of  detail  of  the  summary,  which  will  be  ex¬ 
plained  further  later  in  this  section.  An  example  of 
a  user  selecting  a  temporal  summary  can  be  seen  in 
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Figure  1 .  A  summary  will  be  generated  of  only  the 
text  that  the  user  selected,  which  is  shaded  in  blue. 
An  example  output  summary  may  then  be: 

Rplee  and  BishOp  disagree  if  there  is 
enough  time  to  stay.  SoLo  and  SkyWlkr 
are  talking  about  Ben. 

A  good  summary  for  this  task  will  answer  the 
following  question:  what  conversations  have  taken 
place  within  the  specified  temporal  interval.  In 
some  cases  depending  on  the  user’s  preference,  not 
all  conversations  will  be  included  in  the  summary. 
When  not  all  conversations  arc  included,  then  a  good 
summary  will  consist  of  the  most  important  conver¬ 
sations  and  exclude  those  which  are  deemed  less  im¬ 
portant.  The  amount  of  detail  to  be  presented  for 
each  individual  conversation  will  be  determined  by 
the  temporal  interval  and  the  level  of  detail  requested 
by  the  user,  which  is  discussed  later  in  this  section. 

The  summaries  will  need  to  be  generated  after  a 
user  selects  the  temporal  interval.  To  aid  in  this,  we 
envision  that  the  summarizer  will  leverage  the  thread 
summaries.  Conversations  threads,  along  with  their 
abstracts,  will  be  stored  in  memory,  and  these  will 
be  updated  every  time  a  new  message  is  received. 
The  temporal  summarizer  can  then  use  the  thread 
summaries  to  generate  the  temporal  summaries. 

A  user  will  also  be  able  to  specify  the  level  of 
detail  in  the  summary  in  addition  to  the  temporal 
interval.  When  generating  a  temporal  summary,  a 
higher  level  of  detail  will  result  in  a  longer  summary, 
with  the  highest  level  of  detail  resulting  in  a  sum¬ 
mary  consisting  of  all  the  thread  summaries  within 
the  temporal  interval.  In  the  case  of  a  lower  level  of 
detail,  the  summarizer  will  have  to  determine  which 
threads  arc  important  to  include,  and  further  abstract 
them  to  create  a  smaller  summary.  The  benefit  of  al¬ 
lowing  the  user  to  specify  the  level  of  detail  is  so  that 
they  can  determine  how  much  detail  they  need  based 
on  personal  requirements.  For  example,  if  someone 
only  has  a  short  amount  of  time  to  read  a  summary, 
then  they  can  specify  a  low  level  of  detail  to  quickly 
understand  the  important  points  discussed  within  the 
temporal  interval  they  want  covered. 

Temporal  summaries  present  additional  chal¬ 
lenges  to  address.  The  primary  one  is  determining 
which  conversation  threads  to  include  in  the  sum¬ 
mary,  which  require  a  ranking  metric.  Additionally, 


there  is  an  issue  of  whether  to  include  a  conversation 
thread  if  all  messages  do  not  all  fall  within  the  tem¬ 
poral  interval.  For  example,  if  there  is  a  long  conver¬ 
sation  composed  of  many  messages,  and  only  one 
message  falls  within  the  temporal  interval,  should  it 
then  be  included  or  discarded?  These  issues  will  also 
need  to  be  addressed  as  this  research  progresses. 

2.3  Chat  Corpora 

An  additional  challenge  of  this  work  is  finding  a 
suitable  chat  coipus  that  can  be  used  for  testing  and 
evaluating  summarization  applications.  Most  chat 
corpora  do  not  have  any  summaries  associated  with 
them  to  use  for  a  gold  standard,  making  evaluations 
difficult.  This  evaluation  difficulty  is  described  fur¬ 
ther  in  Section  5. 

Currently,  we  arc  aware  of  two  publicly  available 
chat  logs  with  associated  summaries.  One  of  these  is 
the  GNUe  Traffic  archive1,  which  contains  human- 
created  summaries  in  the  form  of  a  newsletter  based 
primarily  on  Internet  Relay  Chat  (IRC)  logs.  Work¬ 
ing  with  these  chat  logs  requires  abstractive  (i.e., 
summaries  consisting  of  system-generated  text)  and 
extractive  (i.e.,  summaries  consisting  of  text  copied 
from  source  material)  applications  (Lin,  2009),  as 
the  summaries  arc  composed  of  both  human  narra¬ 
tion  and  quotes  from  the  chat  logs. 

The  other  corpus  is  composed  of  chat  logs  and 
summaries  of  a  group  of  users  roleplaying  a  fantasy 
game  over  IRC.2  The  summaries  are  of  an  abstrac¬ 
tive  form.  Creating  summaries  for  these  logs  is  more 
difficult  since  the  summaries  take  on  different  styles. 
Some  summarize  the  events  of  each  character  (e.g., 
their  actions  during  a  battle),  while  others  are  more 
elaborate  in  describing  the  chat  events  using  a  strong 
fantasy  style. 

3  Related  Work 

Summarization  has  been  applied  to  many  different 
media  (Lin,  2009;  Sparck  Jones,  2007),  but  only 
Zhou  and  Hovy  (2005)  have  worked  on  summariz¬ 
ing  chat.  They  investigated  summarizing  chat  logs  in 
order  to  create  summaries  comparable  to  the  human- 
made  GNUe  Traffic  digests,  which  were  described 
in  Section  2.3.  Their  approach  clustered  partial  mes- 

1  http :  //kt.  earth.  li/GNU  e/index  .html 

2http://www.bluearch.net/night/history.html 


3 


sages  under  identified  topics,  then  created  a  collec¬ 
tion  of  summaries,  with  one  summary  for  each  topic. 
In  their  work,  they  were  using  an  extractive  form 
of  summarization.  For  evaluation,  they  rewrote  the 
GNUe  Traffic  digests  to  partition  the  summaries  into 
summaries  for  each  topic,  making  it  easier  to  com¬ 
pare  with  their  system-produced  summaries.  Their 
approach  performed  well,  outperforming  a  baseline 
approach  and  achieving  an  F-score  of  0.52. 

There  has  also  been  work  on  summarization  of 
media  which  share  some  similarities  to  chat.  For 
example,  Zechner  (2002)  examined  summarization 
of  multiparty  dialogues  and  Murray  et  al.  (2005)  ex¬ 
amined  summarization  of  meeting  recordings.  Both 
of  these  media  share  in  common  with  chat  the  dif¬ 
ficulty  of  summarizing  conversations  with  multiple 
participants.  A  difference  with  chat  is  that  both  of 
these  publications  focused  on  one  conversation  se¬ 
quentially  while  chat  is  characterized  by  multiple, 
unrelated  conversations  taking  place  simultaneously. 
Newman  and  Blitzer  (2003)  described  the  beginning 
stages  of  their  work  on  summarizing  archived  dis¬ 
cussions  of  newsgroups  and  mailing  lists.  This  has 
some  similarity  with  conversations,  but  a  difference 
is  that  newsgroups  and  mailing  lists  have  metadata 
to  help  differentiate  the  threaded  conversations.  Ad¬ 
ditional  differences  between  chat  and  these  other 
media  can  be  seen  in  the  unusual  features  not  found 
in  other  forms  of  written  texts,  as  described  earlier 
in  Section  1. 

4  Planned  Approach 

We  envision  taking  a  three  step  approach  to  achieve 
our  goals  for  this  research.  We  will  abstract  this  to  a 
non-military  domain,  so  that  it  is  more  accessible  to 
the  research  community. 

4.1  Foundation 

The  first  step  is  to  focus  on  improving  techniques  for 
summarizing  chat  logs  in  general  to  create  a  founda¬ 
tion  for  future  extensions.  With  the  only  approach 
so  far  having  been  by  Zhou  and  Hovy  (2005),  it  is 
unknown  whether  this  is  the  best  path  for  chat  sum¬ 
marization,  nor  is  it  known  how  well  it  would  work 
for  real-time  chat.  Also,  since  its  publication,  new 
techniques  for  analyzing  multiparticipant  chat  have 
been  introduced,  particularly  in  thread  disentangle¬ 


ment,  which  could  improve  chat  summarization. 

We  hypothesize  that  constructing  an  approach  that 
incorporates  new  techniques  and  ideas,  while  ad¬ 
dressing  lessons  learned  by  Zhou  and  Hovy  (2005), 
can  result  in  a  more  robust  chat  summarizer  that  can 
generate  summaries  online.  A  paid  of  this  process 
will  include  examining  other  techniques  for  summa¬ 
rization,  drawing  on  ideas  from  related  work  dis¬ 
cussed  in  Section  3,  such  as  leveraging  latent  se¬ 
mantic  analysis  (Murray  et  ah,  2005).  Furthermore, 
we  will  incorporate  past  work  on  dialogue  act  tag¬ 
ging  in  chat  (Wu  et  al.,  2005)  to  both  improve  sum¬ 
marization  and  create  a  framework  for  the  next  two 
steps.  However,  there  is  one  limitation  with  their 
work:  the  templates  used  for  tagging  were  manually 
created,  which  is  both  time-intensive  and  fragile.  To 
overcome  this,  we  plan  to  use  an  unsupervised  learn¬ 
ing  approach  to  discover  dialogue  acts  (Ritter  et  ah, 
2010). 

4.2  Thread  Extension 

The  second  step  will  be  to  extend  summarization 
to  thread  summaries.  This  will  require  leveraging 
thread  disentanglement  techniques,  with  the  possi¬ 
bility  of  using  multiple  techniques  to  improve  the  ca¬ 
pability  of  finding  whole  conversation  threads.  For 
the  summary  generations,  we  will  first  create  extrac¬ 
tive  summaries  before  extending  the  summarizer  to 
generate  abstractive  summaries.  In  addition,  we  will 
address  the  problem  of  incomplete  conversations  for 
the  cases  when  not  all  messages  can  be  extracted 
correctly,  or  when  not  all  the  messages  of  a  conver¬ 
sation  arc  available  due  to  joining  a  chat  room  in  the 
middle  of  a  conversation. 

Another  task  will  be  the  creation  of  a  suitable  cor¬ 
pus  for  this  work.  As  discussed  in  Section  2.3,  there 
arc  only  two  known  corpora  with  associated  sum¬ 
maries.  Neither  of  these  corpora  are  well  suited 
for  thread  summarization  since  the  summaries  arc 
not  targeted  towards  answering  specific  questions 
(see  Section  2.1),  making  evaluations  difficult.  We 
plan  on  creating  a  corpus  by  extending  an  exist¬ 
ing  thread  disentanglement  corpus  (Eisner  and  Char- 
niak,  2010).  This  corpus  consists  of  technical  chat 
on  IRC  related  to  Linux,  and  has  been  annotated  by 
humans  for  conversation  threads.  We  will  expand 
this  corpus  to  include  both  extractive  and  abstractive 
summaries  for  each  of  the  threads.  The  advantage 
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of  using  this  corpus,  beyond  the  annotations,  is  that 
it  is  topic-focused,  which  is  a  closer  match  of  what 
one  would  expect  to  see  in  the  military  domain  com¬ 
pared  to  social  chat. 

4.3  Temporal  Extension 

The  third  and  final  step  will  be  to  extend  summa¬ 
rization  to  temporal  summaries.  The  key  point  of 
this  will  be  to  extend  the  summarization  capability 
so  that  a  user  can  specify  the  level  of  detail  within 
the  summary,  which  will  then  determine  the  length 
of  the  summary  and  how  much  to  include  from  the 
thread  summaries.  This  will  then  involve  creating  a 
ranking  metric  for  the  different  conversations.  Un¬ 
like  the  thread  extension,  no  additional  abstraction 
will  be  needed.  Instead,  the  temporal  extension 
will  reuse  the  thread  summaries,  and  reduce  their 
length  by  ranking  the  sentences  within  the  individ¬ 
ual  summaries  as  done  with  traditional  text  summa¬ 
rization.  Additionally,  the  problem  of  conversation 
threads  containing  messages  both  inside  and  outside 
the  temporal  interval  will  need  to  be  addressed. 

As  with  the  thread  extension,  a  corpora  will  need 
to  be  created  for  this  work.  We  expect  that  this  will 
build  on  the  corpora  used  for  the  thread  extension. 
This  will  then  require  additional  summaries  to  be 
created  for  different  levels  of  temporal  intervals  and 
detail.  To  make  this  task  feasible,  we  will  restrict  the 
number  of  possible  temporal  intervals  and  levels  of 
detail  to  only  a  few  options. 

5  Evaluation  Issues 

A  major  issue  in  summarization  is  evaluation 
(Sparck  Jones,  2007),  which  is  also  a  concern  for 
this  work.  One  problem  for  evaluation  is  the  lack  of 
suitable  gold  standards,  as  described  in  Section  2.3. 
Another  problem  is  that  we  plan  on  working  with 
abstractive  forms  in  the  future. 

For  the  foundation  step,  we  can  follow  the  same 
procedures  as  Zhou  and  Hovy  (2005),  which  would 
allow  us  to  compare  our  results  with  theirs.  This 
would  restrict  the  work  to  only  an  extractive  form 
for  comparisons,  though  it  is  possible  to  extend  to 
abstract  comparisons  due  to  the  gold  standards  being 
composed  of  both  extractive  and  abstractive  means. 

Evaluation  for  the  thread  and  temporal  extensions 
will  require  additional  work  due  to  both  the  lack 


of  suitable  gold  standards  and  our  need  for  abstrac¬ 
tive  summaries  instead  of  extractive  summaries.  The 
evaluations  will  include  both  intrinsic  (i.e.,  how  well 
the  summarizer  is  able  to  meet  its  objectives)  and 
extrinsic  evaluations  (i.e.,  how  well  the  summaries 
allow  the  user  to  perform  their  task,  Sparck  Jones 
(2007)).  For  the  intrinsic  evaluations,  we  will  use 
both  automated  techniques  (e.g.,  ROUGE3)  and  hu¬ 
man  assessors  for  evaluating  both  the  thread  and 
temporal  summarizations.  Some  concerns  for  evalu¬ 
ation  is  that  with  the  thread  summaries,  evaluation 
will  be  impacted  by  how  accurately  conversation 
threads  can  be  extracted.  With  the  temporal  sum¬ 
maries,  the  temporal  intervals  and  the  level  of  detail 
determines  the  length  and  detail  of  the  summary. 

For  the  extrinsic  evaluations,  this  research  will 
be  evaluated  as  part  of  a  larger  project,  which  will 
include  human  subject  studies.  Subjects  will  be 
situated  in  a  simulated  watchstander  environment, 
must  monitor  three  computer  monitors  simultane¬ 
ously  (one  of  which  will  contain  live  chat)  while 
also  listening  to  radio  communications.  Testing  of 
our  chat  summarization  methods  will  be  done  in  col¬ 
laboration  with  testing  on  3D  audio  cueing  to  inves¬ 
tigate  and  evaluate  whether  these  technologies  can 
help  watchstanders  combat  information  overload. 

6  Conclusion 

We  have  presented  the  challenges  we  face  in  chat 
summarization.  Our  goal  for  this  research  is  that  it 
will  result  in  a  robust  chat  summarizer  which  is  able 
to  generate  abstract  summaries  in  real-time.  This  is 
a  difficult,  exciting  domain,  with  many  possible  ap¬ 
plications.  We  have  shown  that  the  difficulties  arc 
due  to  the  chat  medium  itself,  lack  of  suitable  data, 
and  difficulties  of  evaluation. 
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